Factored Markov Decision Processes
Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
We study minimax optimal reinforcement learning in episodic factored Markov decision processes (FMDPs), which are MDPs with conditionally independent transition components. Assuming the factorization is known, we propose two model-based algorithms. The first achieves minimax optimal regret guarantees for a rich class of factored structures, while the second enjoys better computational complexity at the cost of slightly worse regret. A key new ingredient of our algorithms is the design of a bonus term to guide exploration. We complement our algorithms with several structure-dependent lower bounds on regret for FMDPs that reveal the difficulty hidden in the intricacy of these structures.
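To make the factored structure concrete, here is a minimal sketch of a factored transition model with conditionally independent components. The scope sets, component sizes, and random conditional tables below are illustrative assumptions, not the paper's model or algorithm.

import numpy as np

# Minimal sketch of an FMDP transition model (hypothetical example).
# The state has m components; the next value of component i depends
# only on a small "scope" of current components plus the action.

rng = np.random.default_rng(0)

m = 3                        # number of state components
sizes = [2, 2, 3]            # domain size of each component
n_actions = 2
scopes = [[0], [0, 1], [2]]  # scope Z_i: parents of component i (assumed)

def random_cpt(scope, size):
    # One conditional table per component: P_i(s'_i | s[Z_i], a).
    shape = tuple(sizes[j] for j in scope) + (n_actions, size)
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)  # normalize over s'_i

cpts = [random_cpt(Z, sz) for Z, sz in zip(scopes, sizes)]

def transition_prob(s, a, s_next):
    # P(s' | s, a) = prod_i P_i(s'_i | s[Z_i], a) by conditional independence.
    p = 1.0
    for i, Z in enumerate(scopes):
        idx = tuple(s[j] for j in Z) + (a, s_next[i])
        p *= cpts[i][idx]
    return p

print(transition_prob((0, 1, 2), 1, (1, 0, 0)))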
Review for NeurIPS paper: Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
Additional Feedback: Response to author feedback: From the informal discussion of the cross-component counters, I gather that it is somehow bad if different components have been explored unevenly, and that encouraging more balanced (pairwise) exploration therefore reduces the overall variance in the amount of exploration across components. I'm sure there's a lot I'm not getting, but that helps a bit. I think it should be the case that you recover an object when you multiply its factors together (for the appropriate definition of "multiply"). There are papers (well, just one I can think of) that deal with truly factored MDPs that are the product of simpler MDPs; they correctly call their MDPs factored.
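To make the reviewer's "multiply its factors together" point concrete, the standard FMDP factorization (written here in common notation with scope sets Z_i, which is an assumption about notation rather than the paper's own) is

P(s' \mid s, a) = \prod_{i=1}^{m} P_i\bigl(s'_i \mid s[Z_i], a\bigr),

where s = (s_1, \ldots, s_m) and s[Z_i] denotes the restriction of the state to the components in the scope Z_i; multiplying the component factors recovers the full transition kernel.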
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.43)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)
Review for NeurIPS paper: Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes
While this paper initially drew some mild divergence of opinion among the reviewers, after the author response and some detailed discussion it was agreed that this paper makes a solid contribution (please see the revised reviews). It is certainly of relevance to NeurIPS. After discussion, there was agreement on the significance of the conceptual contribution, namely the treatment of the cross-component bonuses. Several reviewers note that the mathematics is fairly "standard" (Bernstein-bound machinery), though in the end that should not be considered a drawback. At least one reviewer notes that the 31-page appendix made it impossible to verify the mathematical results during the review period.
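For readers unfamiliar with the "Bernstein-bound machinery" mentioned above: a generic empirical-Bernstein exploration bonus (an illustration of the standard technique, not the paper's exact bonus) takes the form

b(s, a) = \sqrt{\frac{2\,\widehat{\mathrm{Var}}\,\log(1/\delta)}{n(s, a)}} + \frac{C \log(1/\delta)}{n(s, a)},

where n(s, a) is the visit count, \widehat{\mathrm{Var}} is an empirical variance of next-state values, and C is a range constant; in the factored setting such counts are maintained per component (and, per the discussion, across pairs of components).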
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)
Open Questions for Building Optimal Operation Policies for Dam Management Using Factored Markov Decision Processes
Reyes, Alberto (Instituto de Investigaciones Electricas) | Ibarguengoytia, Pablo H. (Instituto de Investigaciones Electricas) | Romero, Inés (Instituto de Investigaciones Electricas) | Pech, David (Instituto de Investigaciones Electricas) | Borunda, Mónica (Instituto de Investigaciones Electricas)
In this paper, we present the conceptual model of a real-world application of Markov decision processes to dam management. The idea is to demonstrate that it is possible to efficiently automate the construction of operation policies by modelling the problem as a sequential decision problem that can be easily solved using stochastic dynamic programming. We will explain the problem domain and provide an analysis of the resulting value and policy functions. We will also present a useful discussion of the issues that arise when the conceptual model is extended into a real-world application.
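As a sketch of the stochastic dynamic programming step the abstract refers to, here is a minimal value-iteration loop over a generic tabular MDP; the reservoir-style interpretation and the toy numbers are purely illustrative assumptions, not the authors' model.

import numpy as np

# Minimal value-iteration sketch (illustrative).
# States could encode discretized reservoir levels; actions, release amounts.

n_states, n_actions, gamma = 5, 2, 0.95
rng = np.random.default_rng(1)

P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=-1, keepdims=True)          # row-stochastic transitions
R = rng.random((n_states, n_actions))       # expected immediate rewards

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V)                 # Q[s, a] = R + gamma * E[V(s')]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:    # stop at fixed-point tolerance
        break
    V = V_new

policy = Q.argmax(axis=1)                   # greedy operation policy
print(V, policy)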
- Information Technology > Decision Support Systems (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)